Generated: /root/verbalize-sampling

Experiments: 5

Metrics: diversity, ngram, joke_quality

| Name | Task | Method | Model | Responses | Temperature |
|---|---|---|---|---|---|
| direct (samples=1) | joke | direct | openai/gpt-4.1 | 30 | 0.7 |
| sequence [strict] (samples=5) | joke | sequence | openai/gpt-4.1 | 30 | 0.7 |
| multi_turn [strict] (samples=5) | joke | multi_turn | openai/gpt-4.1 | 30 | 0.7 |
| structure_with_prob [strict] (samples=5) | joke | structure_with_prob | openai/gpt-4.1 | 30 | 0.7 |
| combined [strict] (samples=5) | joke | combined | openai/gpt-4.1 | 30 | 0.7 |

Tell me a joke.
Why don't skeletons fight each other? They don't have the guts!

Tell me a joke.
Why don't skeletons fight each other? They don't have the guts.

Tell me a joke.
Why did the scarecrow win an award? Because he was outstanding in his field!

Tell me a joke.
Why did the scarecrow win an award? Because he was outstanding in his field!

Tell me a joke.
Why did the scarecrow win an award? Because he was outstanding in his field!

Tell me a joke.
Why did the scarecrow win an award? Because he was outstanding in his field!

Tell me a joke.
Why don't skeletons fight each other? They don't have the guts.

Tell me a joke.
Why did the scarecrow win an award? Because he was outstanding in his field!

Tell me a joke.
Why did the math book look sad? Because it had too many problems.

Tell me a joke.
Why don't skeletons fight each other? They don't have the guts.

Tell me a joke.
Why don't skeletons fight each other? They don't have the guts.

Tell me a joke.
Why don't skeletons fight each other? They don't have the guts.
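
As a quick illustration of the repetition that the diversity and ROUGE-L tables quantify, the snippet below tallies the twelve "Tell me a joke." responses listed above. It is a hypothetical check, not part of the report's pipeline; the `normalize` helper is an assumption that ignores case and trailing punctuation, so "guts!" and "guts." count as the same joke. Under that rule the twelve samples reduce to three distinct jokes (six skeleton, five scarecrow, one math book), and to four distinct strings without normalization.

```python
import string
from collections import Counter

# The twelve "Tell me a joke." responses, in order, with the encoding repaired.
SKELETON = "Why don't skeletons fight each other? They don't have the guts"
SCARECROW = "Why did the scarecrow win an award? Because he was outstanding in his field!"
MATH_BOOK = "Why did the math book look sad? Because it had too many problems."

RESPONSES = [
    SKELETON + "!",
    SKELETON + ".",
    SCARECROW, SCARECROW, SCARECROW, SCARECROW,
    SKELETON + ".",
    SCARECROW,
    MATH_BOOK,
    SKELETON + ".", SKELETON + ".", SKELETON + ".",
]


def normalize(text: str) -> str:
    # Hypothetical rule: lowercase and drop trailing punctuation only.
    return text.lower().rstrip(string.punctuation)


exact = len(set(RESPONSES))  # distinct strings, punctuation included
counts = Counter(normalize(r) for r in RESPONSES)

print(exact)
for joke, n in counts.most_common():
    print(n, joke)
```

The exact-string count overstates variety slightly because a single "!" versus "." is enough to make two otherwise identical jokes distinct.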

Give me a witty joke that's also clever
Why did the mathematician refuse to argue with the obtuse triangle? Because he knew it was never right.

Give me a witty joke that's also clever
Why did the scarecrow become a successful motivational speaker? Because he was outstanding in his field and always knew how to plant good ideas in people's heads. He even gave a TED talk called "From Straw to Awe: Harvesting Your Potential." Attendees said his advice was a-maize-ing, and he left everyone in stitches, though that might have just been from the wind. Now, whenever people need a little inspiration, they say, "Let's get some advice from the guy who knows how to weather any storm and still stand tall!"

Give me a witty joke that's also clever
Did you hear about the linguist who was accused of being too possessive? He always wanted to have the last word. His friends tried to intervene, but every time they suggested a compromise, he'd interject with, "Actually, the etymology of that term is..." and proceed to give a 20-minute lecture. Eventually, they realized arguing with him was like participating in a never-ending spelling bee, except the only word spelled out was D-E-B-A-T-E. Some say he found true love with a grammarian; they now spend their evenings diagramming sentences and correcting each other's clauses. At their wedding, the vows were in perfect subjunctive, and the guests said it was the most articulate ceremony they'd ever attended. So, if you ever want to win an argument with a linguist, just remember: silence is golden, but puns are silver.

| Experiment | Average Diversity | Min Diversity | Max Diversity | Std Diversity | Average Response Length | Average Unique Words | Average Vocabulary Richness | Total Cost |
|---|---|---|---|---|---|---|---|---|
| direct (samples=1) | 0.1008 | 0.0000 | 0.4369 | 0.1628 | 12.1800 | 11.9600 | 0.9840 | 0.0000 |
| sequence [strict] (samples=5) | 0.3023 | 0.0000 | 0.4995 | 0.1197 | 13.2933 | 12.9600 | 0.9778 | 0.0001 |
| multi_turn [strict] (samples=5) | 0.2824 | 0.0000 | 0.4763 | 0.1249 | 12.6667 | 12.4400 | 0.9839 | 0.0000 |
| structure_with_prob [strict] (samples=5) | 0.3086 | 0.0000 | 0.4857 | 0.1185 | 13.0133 | 12.6733 | 0.9764 | 0.0001 |
| combined [strict] (samples=5) | 0.2729 | 0.0000 | 0.4708 | 0.0913 | 96.5611 | 75.1111 | 0.8309 | 0.0005 |
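
The Average Vocabulary Richness column is not simply Average Unique Words divided by Average Response Length: for the direct row, 11.9600 / 12.1800 ≈ 0.982 versus the reported 0.9840, and for the combined row 75.1111 / 96.5611 ≈ 0.778 versus 0.8309. One explanation consistent with these numbers (but not confirmed by the report) is that richness is a type-token ratio computed per response and then averaged, i.e. a mean of ratios rather than a ratio of means. The sketch below assumes that definition; the tokenizer and all function names are hypothetical, not the report's code.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(text: str) -> float:
    # Unique tokens divided by total tokens for one response.
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_of_ratios(responses: list[str]) -> float:
    # Average the per-response ratios (assumed definition of richness).
    return sum(type_token_ratio(r) for r in responses) / len(responses)


def ratio_of_means(responses: list[str]) -> float:
    # Divide the average unique-word count by the average length instead.
    token_lists = [tokenize(r) for r in responses]
    mean_unique = sum(len(set(t)) for t in token_lists) / len(token_lists)
    mean_length = sum(len(t) for t in token_lists) / len(token_lists)
    return mean_unique / mean_length


responses = ["a a", "b c d e"]
print(mean_of_ratios(responses))  # 0.75
print(ratio_of_means(responses))  # about 0.833
```

The two aggregates diverge whenever response lengths vary, which is why the gap is largest for the combined row, whose responses are much longer on average.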

| Experiment | Average ROUGE-L | Min ROUGE-L | Max ROUGE-L | Std ROUGE-L | Average Response Length |
|---|---|---|---|---|---|
| direct (samples=1) | 0.7414 | 0.0000 | 1.0000 | 0.3511 | 12.1800 |
| sequence [strict] (samples=5) | 0.1827 | 0.0000 | 1.0000 | 0.2613 | 13.2933 |
| multi_turn [strict] (samples=5) | 0.2487 | 0.0000 | 1.0000 | 0.2806 | 12.6667 |
| structure_with_prob [strict] (samples=5) | 0.1701 | 0.0000 | 1.0000 | 0.2614 | 13.0133 |
| combined [strict] (samples=5) | 0.1058 | 0.0000 | 1.0000 | 0.1077 | 96.5611 |
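
The ngram metric is reported as ROUGE-L, where lower values mean less lexical overlap between responses. A minimal sketch, assuming the score is the mean ROUGE-L F-measure (β = 1) over all unordered pairs of responses from one experiment and a simple lowercase word tokenizer; the report states neither detail, and every helper name here is hypothetical. Under this definition any two identical responses score 1.0, which matches the Max ROUGE-L of 1.0000 in every row.

```python
import re
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def lcs_length(a: list[str], b: list[str]) -> int:
    # Longest common subsequence via dynamic programming, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate: str, reference: str) -> float:
    # F-measure with beta = 1, so the score is symmetric in its arguments.
    c, r = tokenize(candidate), tokenize(reference)
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_rouge_l(responses: list[str]) -> float:
    # Average over every unordered pair of responses.
    scores = [rouge_l_f(a, b) for a, b in combinations(responses, 2)]
    return sum(scores) / len(scores) if scores else 0.0


a = "Why did the scarecrow win an award?"
b = "Why did the math book look sad?"
print(rouge_l_f(a, a))  # 1.0
print(rouge_l_f(a, b))  # LCS "why did the" -> 3/7
```

A production implementation would more likely use an established package such as `rouge-score`, whose tokenization differs from this sketch, so absolute values would not match the table exactly.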

| Experiment | Funniness | Cleverness | Originality | Structure | Overall | Normalized Overall |
|---|---|---|---|---|---|---|
| direct (samples=1) | 3.4933 | 3.5600 | 2.2600 | 4.0867 | 3.3523 | 0.6705 |
| sequence [strict] (samples=5) | 3.3600 | 3.8600 | 2.4733 | 4.0933 | 3.4177 | 0.6835 |
| multi_turn [strict] (samples=5) | 3.2533 | 3.6400 | 2.3333 | 4.1200 | 3.2960 | 0.6592 |
| structure_with_prob [strict] (samples=5) | 3.4267 | 3.8800 | 2.4467 | 4.1800 | 3.4570 | 0.6914 |
| combined [strict] (samples=5) | 3.3333 | 4.0389 | 3.4111 | 3.6833 | 3.5778 | 0.7156 |
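
In every row, Normalized Overall equals Overall divided by 5 (e.g. 3.3523 / 5 = 0.67046, reported as 0.6705), which suggests a maximum rating of 5; the report does not state the scale. Overall is not the plain mean of the four criterion scores (for direct, (3.4933 + 3.5600 + 2.2600 + 4.0867) / 4 = 3.3500 versus 3.3523), so it appears to be scored separately. The check below recomputes the normalization from the table values; `SCALE_MAX`, `normalize` and `check_rows` are hypothetical names.

```python
import math

# (Overall, Normalized Overall) pairs copied from the joke_quality table.
ROWS = {
    "direct (samples=1)": (3.3523, 0.6705),
    "sequence [strict] (samples=5)": (3.4177, 0.6835),
    "multi_turn [strict] (samples=5)": (3.2960, 0.6592),
    "structure_with_prob [strict] (samples=5)": (3.4570, 0.6914),
    "combined [strict] (samples=5)": (3.5778, 0.7156),
}

# Assumption: scores are on a scale whose maximum is 5.
SCALE_MAX = 5.0


def normalize(overall: float, scale_max: float = SCALE_MAX) -> float:
    return overall / scale_max


def check_rows(rows: dict[str, tuple[float, float]]) -> dict[str, bool]:
    # True when the recomputed value is within rounding of the 4-decimal table value.
    return {
        name: math.isclose(normalize(overall), reported, abs_tol=1e-4)
        for name, (overall, reported) in rows.items()
    }


for name, ok in check_rows(ROWS).items():
    print(f"{name}: {'matches' if ok else 'differs'}")
```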